AITopics | margin error

Implicit bias induced by gradient-based algorithms is essential to the generalization of overparameterized models, yet its mechanisms can be subtle. This work leverages the Normalized Steepest Descent} (NSD) framework to investigate how optimization geometry shapes solutions on multiclass separable data. We introduce NucGD, a geometry-aware optimizer designed to enforce low rank structures through nuclear norm constraints. Beyond the algorithm itself, we connect NucGD with emerging low-rank projection methods, providing a unified perspective. To enable scalable training, we derive an efficient SVD-free update rule via asynchronous power iteration. Furthermore, we empirically dissect the impact of stochastic optimization dynamics, characterizing how varying levels of gradient noise induced by mini-batch sampling and momentum modulate the convergence toward the expected maximum margin solutions.Our code is accessible at: https://github.com/Tsokarsic/observing-the-implicit-bias-on-multiclass-seperable-data.

artificial intelligence, machine learning, nucgd, (14 more...)

arXiv.org Machine Learning

2603.22824

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Sparsity of Data Representation of Optimal Kernel Machine and Leave-one-out Estimator

Neural Information Processing SystemsApr-6-2023, 17:03:44 GMT

Vapnik's result that the expectation of the generalisation error ofthe opti(cid:173) mal hyperplane is bounded by the expectation of the ratio of the number of support vectors to the number of training examples is extended to a broad class of kernel machines. The class includes Support Vector Ma(cid:173) chines for soft margin classification and regression, and Regularization Networks with a variety of kernels and cost functions. We show that key inequalities in Vapnik's result become equalities once "the classification error" is replaced by "the margin error", with the latter defined as an in(cid:173) stance with positive cost. In particular we show that expectations of the true margin error and the empirical margin error are equal, and that the sparse solutions for kernel machines are possible only if the cost function is "partially" insensitive.

data representation, kernel machine and leave-one-out estimator, optimal kernel machine, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Probabilities of Causation: Adequate Size of Experimental and Observational Samples

Li, Ang, Mao, Ruirui, Pearl, Judea

arXiv.org Artificial IntelligenceOct-10-2022

The probabilities of causation are commonly used to solve decision-making problems. Tian and Pearl derived sharp bounds for the probability of necessity and sufficiency (PNS), the probability of sufficiency (PS), and the probability of necessity (PN) using experimental and observational data. The assumption is that one is in possession of a large enough sample to permit an accurate estimation of the experimental and observational distributions. In this study, we present a method for determining the sample size needed for such estimation, when a given confidence interval (CI) is specified. We further show by simulation that the proposed sample size delivered stable estimations of the bounds of PNS.

artificial intelligence, bernoulli, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2210.05027

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.32)
North America > United States > Wisconsin > Dane County > Madison (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

Add feedback

Cost-sensitive detection with variational autoencoders for environmental acoustic sensing

Li, Yunpeng, Kiskin, Ivan, Zilli, Davide, Sinka, Marianne, Chan, Henry, Willis, Kathy, Roberts, Stephen

arXiv.org Machine LearningDec-6-2017

Environmental acoustic sensing involves the retrieval and processing of audio signals to better understand our surroundings. While large-scale acoustic data make manual analysis infeasible, they provide a suitable playground for machine learning approaches. Most existing machine learning techniques developed for environmental acoustic sensing do not provide flexible control of the trade-off between the false positive rate and the false negative rate. This paper presents a cost-sensitive classification paradigm, in which the hyper-parameters of classifiers and the structure of variational autoencoders are selected in a principled Neyman-Pearson framework. We examine the performance of the proposed approach using a dataset from the HumBug project which aims to detect the presence of mosquitoes using sound collected by simple embedded devices.

artificial intelligence, machine learning, variational autoencoder, (14 more...)

arXiv.org Machine Learning

1712.02488

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.06)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area (0.33)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

On Soft Power Diagrams

Borgwardt, Steffen

arXiv.org Machine LearningJun-24-2014

Noname manuscript No. (will be inserted by the editor) Abstract Many applications in data analysis begin with a set of points in a Euclidean space that is partitioned into clusters. Common tasks then are to devise a classifier deciding which of the clusters a new point is associated to, finding outliers with respect to the clusters, or identifying the type of clustering used for the partition. One of the common kinds of clusterings are (balanced) least-squares assignments with respect to a given set of sites. For these, there is a'separating power diagram' for which each cluster lies in its own cell. In the present paper, we aim for efficient algorithms for outlier detection and the computation of thresholds that measure how similar a clustering is to a leastsquares assignment for fixed sites. For this purpose, we devise a new model for the computation of a'soft power diagram', which allows a soft separation of the clusters with'point counting properties'; e.g. As our results hold for a more general non-convex model of free sites, we describe it and our proofs in this more general way. Its locally optimal solutions satisfy the aforementioned point counting properties. For our target applications that use fixed sites, our algorithms are efficiently solvable to global optimality by linear programming.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Machine Learning

1307.3949

Country: Europe > Germany (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Add feedback

The Stochastic Gradient Descent for the Primal L1-SVM Optimization Revisited

Panagiotakopoulos, Constantinos, Tsampouka, Petroula

arXiv.org Artificial IntelligenceJan-25-2014

We reconsider the stochastic (sub)gradient approach to the unconstrained primal L1-SVM optimization. We observe that if the learning rate is inversely proportional to the number of steps, i.e., the number of times any training pattern is presented to the algorithm, the update rule may be transformed into the one of the classical perceptron with margin in which the margin threshold increases linearly with the number of steps. Moreover, if we cycle repeatedly through the possibly randomly permuted training set the dual variables defined naturally via the expansion of the weight vector as a linear combination of the patterns on which margin errors were made are shown to obey at the end of each complete cycle automatically the box constraints arising in dual optimization. This renders the dual Lagrangian a running lower bound on the primal objective tending to it at the optimum and makes available an upper bound on the relative accuracy achieved which provides a meaningful stopping criterion. In addition, we propose a mechanism of presenting the same pattern repeatedly to the algorithm which maintains the above properties. Finally, we give experimental evidence that algorithms constructed along these lines exhibit a considerably improved performance.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1304.6383

Country: Europe (0.28)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.87)

Add feedback

Ranking with Large Margin Principle: Two Approaches

Shashua, Amnon, Levin, Anat

Neural Information Processing SystemsDec-31-2003

We discuss the problem of ranking k instances with the use of a "large margin" principle. We introduce two main approaches: the first is the "fixed margin" policy in which the margin of the closest neighboring classes is being maximized - which turns out to be a direct generalization of SVM to ranking learning. The second approach allows for k - 1 different margins where the sum of margins is maximized. This approach is shown to reduce to lI-SVM when the number of classes k 2. Both approaches are optimal in size of 21 where I is the total number of training examples. Experiments performed on visual classification and "collaborative filtering" show that both approaches outperform existing ordinal regression algorithms applied for ranking and multi-class SVM applied to general multi-class classification.

hyperplane, learning, margin error, (15 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.05)
North America > United States > Wisconsin (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.71)

Add feedback

Ranking with Large Margin Principle: Two Approaches

Shashua, Amnon, Levin, Anat

Neural Information Processing SystemsDec-31-2003

We discuss the problem of ranking k instances with the use of a "large margin" principle. We introduce two main approaches: the first is the "fixed margin" policy in which the margin of the closest neighboring classes is being maximized - which turns out to be a direct generalization of SVM to ranking learning. The second approach allows for k - 1 different margins where the sum of margins is maximized. This approach is shown to reduce to lI-SVM when the number of classes k 2. Both approaches are optimal in size of 21 where I is the total number of training examples. Experiments performed on visual classification and "collaborative filtering" show that both approaches outperform existing ordinal regression algorithms applied for ranking and multi-class SVM applied to general multi-class classification.

hyperplane, learning, margin error, (15 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.05)
North America > United States > Wisconsin (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.71)

Add feedback

Ranking with Large Margin Principle: Two Approaches

Shashua, Amnon, Levin, Anat

Neural Information Processing SystemsDec-31-2003

We discuss the problem of ranking k instances with the use of a "large margin" principle. We introduce two main approaches: the first is the "fixed margin" policy in which the margin of the closest neighboring classes is being maximized - which turns out to be a direct generalization ofSVM to ranking learning. The second approach allows for k - 1 different margins where the sum of margins is maximized. This approach is shown to reduce to lI-SVM when the number of classes k 2. Both approaches are optimal in size of 21 where I is the total number of training examples. Experiments performed on visual classification and "collaborative filtering"show that both approaches outperform existing ordinal regression algorithms applied for ranking and multi-class SVM applied to general multi-class classification. 1 Introduction In this paper we investigate the problem of inductive learning from the point of view of predicting variables of ordinal scale [3, 7,5], a setting referred to as ranking learning or ordinal regression. We consider the problem of applying the large margin principle used in Support Vector methods [12, 1] to the ordinal regression problem while maintaining an (optimal) problem size linear in the number of training examples.

artificial intelligence, inductive learning, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Israel (0.15)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.38)

Add feedback

Sparsity of Data Representation of Optimal Kernel Machine and Leave-one-out Estimator

Kowalczyk, Adam

Neural Information Processing SystemsDec-31-2001

Vapnik's result that the expectation of the generalisation error ofthe optimal hyperplane is bounded by the expectation of the ratio of the number of support vectors to the number of training examples is extended to a broad class of kernel machines. The class includes Support Vector Machines for soft margin classification and regression, and Regularization Networks with a variety of kernels and cost functions. We show that key inequalities in Vapnik's result become equalities once "the classification error" is replaced by "the margin error", with the latter defined as an instance with positive cost. In particular we show that expectations of the true margin error and the empirical margin error are equal, and that the sparse solutions for kernel machines are possible only if the cost function is "partially" insensitive. 1 Introduction Minimization of regularized risk is a backbone of several recent advances in machine learning, including Support Vector Machines (SVM) [13], Regularization Networks (RN) [5] or Gaussian Processes [15]. Such a machine is typically implemented as a weighted sum of a kernel function evaluated for pairs composed of a data vector in question and a number of selected training vectors, so called support vectors.

assumption, cost function, support vector, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > New York (0.05)
Oceania > Australia (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Add feedback

Filters

Collaborating Authors

margin error

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Towards The Implicit Bias on Multiclass Separable Data Under Norm Constraints

Sparsity of Data Representation of Optimal Kernel Machine and Leave-one-out Estimator

Probabilities of Causation: Adequate Size of Experimental and Observational Samples

Cost-sensitive detection with variational autoencoders for environmental acoustic sensing

On Soft Power Diagrams

The Stochastic Gradient Descent for the Primal L1-SVM Optimization Revisited

Ranking with Large Margin Principle: Two Approaches

Ranking with Large Margin Principle: Two Approaches

Ranking with Large Margin Principle: Two Approaches

Sparsity of Data Representation of Optimal Kernel Machine and Leave-one-out Estimator